Guessing morphological classes of unknown German nouns
Abstract
This paper describes a system for recognizing and morphologically classifying unknown German words. Given raw text, it outputs a list of the unknown nouns together with hypotheses about their possible stems and morphological class(es). The system exploits both global and local information, as well as morphological properties and external linguistic knowledge sources. It learns and applies ending-guessing rules similar to those originally proposed for POS guessing. The paper presents the system design and implementation and discusses its performance on the basis of an extensive evaluation. Similar ending-guessing rules have also been applied to Bulgarian, but performance there is worse because nouns are harder to recognize and because the highly inflectional morphology has numerous ambiguous endings.
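The idea of ending-guessing rules can be illustrated with a minimal sketch. This is not the system's actual implementation: the class labels (`F_en`, `M_0`, `F_uml`), the toy lexicon, the function names, and the plain relative-frequency scoring are all assumptions made for illustration. The sketch counts which morphological classes co-occur with each word ending in a known lexicon, then classifies an unknown noun by its longest ending that has a learned rule.

```python
from collections import Counter, defaultdict

# Hypothetical toy lexicon of (noun, morphological class) pairs.
# The class labels are invented for this sketch, not taken from the paper.
TOY_LEXICON = [
    ("Zeitung", "F_en"), ("Wohnung", "F_en"), ("Rechnung", "F_en"),
    ("Lehrer", "M_0"), ("Fahrer", "M_0"), ("Mutter", "F_uml"),
]

def learn_ending_rules(lexicon, max_len=3, min_count=2):
    """Map each sufficiently frequent ending to a class distribution."""
    counts = defaultdict(Counter)
    for word, cls in lexicon:
        w = word.lower()
        # Consider endings of 1..max_len characters, leaving at least
        # one character for the remainder of the word.
        for n in range(1, min(max_len, len(w) - 1) + 1):
            counts[w[-n:]][cls] += 1
    rules = {}
    for ending, by_class in counts.items():
        total = sum(by_class.values())
        if total >= min_count:  # drop endings seen too rarely
            rules[ending] = {c: k / total for c, k in by_class.items()}
    return rules

def guess_classes(word, rules, max_len=3):
    """Return (class, relative frequency) pairs for the longest matching ending."""
    w = word.lower()
    for n in range(min(max_len, len(w) - 1), 0, -1):
        ending = w[-n:]
        if ending in rules:
            # Most probable class first; ties broken alphabetically.
            return sorted(rules[ending].items(), key=lambda kv: (-kv[1], kv[0]))
    return []  # no rule applies

rules = learn_ending_rules(TOY_LEXICON)
```

Calling `guess_classes("Abstimmung", rules)` matches the ending `ung`, while a word such as `Bäcker` falls back to the shorter ending `er` and receives more than one hypothesis, mirroring the "class(es)" in the abstract. A real system would presumably score rules more carefully than this relative-frequency estimate does.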
Similar resources
MorphoClass - Recognition and Morphological Classification of Unknown Words for German
A system for recognition and morphological classification of unknown words for German is described and evaluated. It takes raw text as input and outputs a list of the unknown nouns together with a hypothesis about their possible morphological class and stem. MorphoClass exploits global information (ending-guessing rules, maximum likelihood estimations, word frequency statistics), morphological ...
A Corpus-based Approach to the Interpretation of Unknown Words with an Application to German
Usually a high portion of the different word forms in a corpus receive no reading by the lexical and/or morphological analysis. These unknown words constitute a huge problem for NLP analysis tasks like POS-tagging or syntactic parsing. We present a parameterizable (in principle language-independent) corpus-based approach for the interpretation of unknown words that only needs a tokeniz...
Morphological features help POS tagging of unknown words across language varieties
Part-of-speech tagging, like any supervised statistical NLP task, is more difficult when test sets are very different from training sets, for example when tagging across genres or language varieties. We examined the problem of POS tagging of different varieties of Mandarin Chinese (PRC-Mainland, PRC-Hong Kong, and Taiwan). An analytic study first showed that unknown words were a major source of ...
Automatic Rule Induction for Unknown-Word Guessing
Words unknown to the lexicon present a substantial problem to NLP modules that rely on morphosyntactic information, such as part-of-speech taggers or syntactic parsers. In this paper we present a technique for fully automatic acquisition of rules that guess possible part-of-speech tags for unknown words using their starting and ending segments. The learning is performed from a general-purpose l...
Unsupervised Learning of Word-Category Guessing Rules
Words unknown to the lexicon present a substantial problem to part-of-speech tagging. In this paper we present a technique for fully unsupervised statistical acquisition of rules which guess possible parts-of-speech for unknown words. Three complementary sets of word-guessing rules are induced from the lexicon and a raw corpus: prefix morphological rules, suffix morphological rules and ending-gu...